Compression of line spectral frequency parameters using the asynchronous interpolation model

Authors

  • Alexander Kain
  • Todd K. Leen
Abstract

We apply an asynchronous interpolation model (AIM) to line spectral frequency trajectories. AIM represents speech transition features as crossfading between basis vector features, governed by individual interpolation weights per feature component. Basis vectors are initialized from demiphone labels and then optimized using a local reconstruction error. Using a small diphone acoustic inventory, we reduce the number of parameters by using dimension-reduced latent-space weights and a vector-quantized pool of basis vectors. The highest compression rate of 1:11 resulted in a log spectral distortion of 4.83 dB.
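The crossfading idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the clipped linear ramp, the function names (`ramp`, `crossfade`, `aim_trajectory`), the per-component start and end times, and the example values are all assumptions chosen for illustration. The abstract only states that each feature component has its own interpolation weight between basis vectors.

```python
def ramp(t, start, end):
    """Hypothetical weight shape: 0 before `start`, 1 after `end`, linear in between."""
    if t <= start:
        return 0.0
    if t >= end:
        return 1.0
    return (t - start) / (end - start)


def crossfade(left, right, weights):
    """Blend two basis vectors using one interpolation weight per component."""
    if not (len(left) == len(right) == len(weights)):
        raise ValueError("all inputs must have the same length")
    return [(1.0 - w) * a + w * b for a, b, w in zip(left, right, weights)]


def aim_trajectory(left, right, starts, ends, n_frames):
    """Generate frames between two basis vectors.

    Each component d moves from left[d] to right[d] on its own schedule
    (starts[d], ends[d]), so components need not transition at the same time.
    Time is normalized to the interval [0, 1].
    """
    if n_frames < 2:
        raise ValueError("n_frames must be at least 2")
    frames = []
    for i in range(n_frames):
        t = i / (n_frames - 1)
        weights = [ramp(t, s, e) for s, e in zip(starts, ends)]
        frames.append(crossfade(left, right, weights))
    return frames


# Illustrative two-component example (values are made up, not from the paper):
# component 0 transitions during the first half, component 1 during the second.
frames = aim_trajectory([0.3, 0.9], [0.5, 1.3], [0.0, 0.5], [0.5, 1.0], 5)
```

In this sketch the first component has already reached its target by the middle frame while the second has not yet started moving, which is the sense in which the interpolation is asynchronous.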


Similar resources

Unit-selection text-to-speech synthesis using an asynchronous interpolation model

We describe the Asynchronous Interpolation Model, which represents speech as a composition of several different types of feature streams that are computed using asynchronous interpolation of neighboring basis vectors, according to transition weights. When applied to the acoustic inventory of a concatenative Text-to-Speech synthesizer, the model eliminates concatenation errors and affords opport...


Interpolation properties of linear prediction parametric representations

In this paper, interpolation of linear predictive coding (LPC) parameters in terms of the following representations is investigated: linear prediction coefficient representation, reflection coefficient representation, log-area-ratio representation, arc-sine reflection coefficient representation, cepstral coefficient representation, line spectral frequency representation, autocorrelation coeffici...


Low Resource TTS Synthesis Based on Cepstral Filter with Phase Randomized Excitation

In this paper we present the acoustic synthesis of a low-resource Text-To-Speech (TTS) system based on a 7th-order cepstral filter. The excitation signal is designed in the frequency domain by a two-parameter model. This model is able to generate the excitation signal for both voiced and unvoiced segments. The sets of filter coefficients represent the speech units and are stored in a compressed fo...


Robust Transmission of Speech LSFs Using Hidden Markov Model-Based Multiple Description Index Assignments

Speech coding techniques capable of generating encoded representations which are robust against channel losses play an important role in enabling reliable voice communication over packet networks and mobile wireless systems. In this paper, we investigate the use of multiple description index assignments (MDIAs) for loss-tolerant transmission of line spectral frequency (LSF) coefficients, typica...


Power System Analysis for Nonsinusoidal Steady State Studies Based on Wavelets

In this paper, a power system model is represented in a new domain that relates to Multi-Resolution Analysis (MRA) space. By developing mathematical models of the elements in this space using the Galerkin method, a new alternative method for power system simulation under nonsinusoidal, periodic conditions is developed. The mathematical formulation and characteristics of the proposed space are presented. Als...



Publication date: 2010